Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization
In multi-agent systems with a large number of agents, the contribution of
each agent to the value of other agents is typically minimal (e.g., in
aggregation systems such as Uber and Deliveroo). In this paper, we consider such
multi-agent systems in which each self-interested agent takes a sequence of
decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG).
We derive key properties of equilibrium solutions in the SNCG model with
non-atomic and nearly non-atomic agents. Using these equilibrium
properties, we provide a novel Multi-Agent Reinforcement Learning (MARL)
mechanism that minimizes variance across values of agents in the same state. To
demonstrate the utility of this new mechanism, we provide detailed results on a
real-world taxi dataset and also a generic simulator for aggregation systems.
We show that our approach reduces the variance in revenues earned by taxi
drivers, while still providing higher joint revenues than leading approaches.
Comment: arXiv admin note: substantial text overlap with arXiv:2003.0708
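The variance-minimization idea above can be illustrated with a small sketch: alongside the usual value-learning loss, penalize the variance of value estimates among agents that occupy the same state. The function name, the grouping by discrete state ids, and the weight `beta` are all illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

def variance_penalized_loss(values, states, base_loss, beta=0.1):
    """Sketch of a same-state variance penalty for a MARL value loss.

    values: per-agent value estimates, shape (n_agents,)
    states: per-agent discrete state ids, shape (n_agents,)
    base_loss: the ordinary value-learning loss (a scalar here)
    beta: hypothetical weight on the variance term
    """
    penalty = 0.0
    for s in np.unique(states):
        group = values[states == s]
        if len(group) > 1:
            # variance of value estimates among agents sharing state s
            penalty += group.var()
    return base_loss + beta * penalty
```

Minimizing this penalty pushes agents in the same state toward similar values, which is the intuition behind the reported reduction in revenue variance across drivers.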
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman
performance in complex tasks such as StarCraft, Go, and Chess. However,
knowledge transfer from artificial "experts" to humans remains a significant
challenge. A promising avenue for such transfer is the use of curricula.
Recent methods for curriculum generation focus on training RL agents
efficiently, yet such methods rely on surrogate measures to track student
progress and are not suited for training robots in the real world (or, more
ambitiously, humans). In this paper, we introduce a method named Parameterized
Environment Response Model (PERM) that shows promising results in training RL
agents in parameterized environments. Inspired by Item Response Theory, PERM
seeks to model difficulty of environments and ability of RL agents directly.
Given that RL agents and humans are trained more efficiently under the "zone of
proximal development", our method generates a curriculum by matching the
difficulty of an environment to the current ability of the student. In
addition, PERM can be trained offline and does not employ non-stationary
measures of student ability, making it suitable for transfer between students.
We demonstrate PERM's ability to represent the environment parameter space, and
show that training RL agents with PERM produces strong performance in
deterministic environments. Lastly, we show that our method is transferable
between students without any sacrifice in training quality.
Comment: IJCAI'2
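Item Response Theory, which inspires PERM, can be sketched with a one-parameter (Rasch-style) model: the probability of a student succeeding on a level depends only on the gap between ability and difficulty, and a curriculum picks the level closest to a target success rate. This is an illustrative analogue under assumed names; PERM itself learns difficulty and ability representations from data.

```python
import math

def success_prob(ability, difficulty):
    # Rasch-style item response model: success probability depends only
    # on the ability-difficulty gap.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_level(ability, candidate_difficulties, target=0.5):
    # Pick the level whose predicted success probability is closest to
    # `target` -- a crude stand-in for the zone of proximal development.
    return min(candidate_difficulties,
               key=lambda d: abs(success_prob(ability, d) - target))
```

Because this scoring depends only on the ability-difficulty gap, the same difficulty estimates can be reused for a new student once their ability is estimated, which mirrors the transferability claim above.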
ZAC: A Zone pAth Construction approach for effective real-time ridesharing
Diversity Induced Environment Design via Self-Play
Recent work on designing an appropriate distribution of environments has
shown promise for training effective generally capable agents. Its success is
partly because of a form of adaptive curriculum learning that generates
environment instances (or levels) at the frontier of the agent's capabilities.
However, such an environment design framework often struggles to find effective
levels in challenging design spaces and requires costly interactions with the
environment. In this paper, we aim to introduce diversity in the Unsupervised
Environment Design (UED) framework. Specifically, we propose a task-agnostic
method to identify observed/hidden states that are representative of a given
level. The outcome of this method is then utilized to characterize the
diversity between two levels, which, as we show, can be crucial to effective
performance. In addition, to improve sampling efficiency, we incorporate the
self-play technique that allows the environment generator to automatically
generate environments that are of great benefit to the training agent.
Quantitatively, our approach, Diversity-induced Environment Design via
Self-Play (DivSP), shows compelling performance over existing methods.
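As a toy illustration of the level-diversity idea, one could compare the sets of representative states extracted from two levels with a Jaccard distance. The paper's task-agnostic identification of representative observed/hidden states is more involved; this set-overlap measure is purely a sketch with assumed inputs.

```python
def level_diversity(states_a, states_b):
    """Jaccard-distance proxy for the diversity between two levels.

    states_a / states_b: collections of (hashed) representative states
    already extracted from each level. Returns 0.0 for identical levels
    and 1.0 for levels sharing no representative states.
    """
    a, b = set(states_a), set(states_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

A level generator could then prefer candidate levels that maximize this distance to the current training set, trading off diversity against difficulty at the agent's frontier.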
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
On-demand ride-pooling (e.g., UberPool) has recently become popular because
of its ability to lower costs for passengers while simultaneously increasing
revenue for drivers and aggregation companies. Unlike in Taxi on Demand (ToD)
services -- where a vehicle is only assigned one passenger at a time -- in
on-demand ride-pooling, each (possibly partially filled) vehicle can be
assigned a group of passenger requests with multiple different origin and
destination pairs. To ensure near real-time response, existing solutions to the
real-time ride-pooling problem are myopic in that they optimise the objective
(e.g., maximise the number of passengers served) for the current time step
without considering its effect on future assignments. This is because even a
myopic assignment in ride-pooling involves considering which combinations of
passenger requests can be assigned to vehicles, which adds a layer of
combinatorial complexity to the ToD problem.
A popular approach that addresses the limitations of myopic assignments in
ToD problems is Approximate Dynamic Programming (ADP). However, existing ADP
methods for ToD can only handle Linear Program (LP)-based assignments, while
the assignment problem in ride-pooling requires an Integer Linear Program (ILP)
with bad LP relaxations. To this end, our key technical contribution is in
providing a general ADP method that can learn from ILP-based assignments.
Additionally, we handle the extra combinatorial complexity from combinations of
passenger requests by using a Neural Network based approximate value function
and show a connection to Deep Reinforcement Learning that allows us to learn
this value-function with increased stability and sample-efficiency. We show
that our approach outperforms past approaches on a real-world dataset by up to
16%, a significant improvement in city-scale transportation problems.
Comment: Accepted for publication to the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20
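The non-myopic scoring described above can be sketched as follows: each candidate (vehicle, request-group) pairing is scored by its immediate reward plus a discounted estimate of the resulting state's value, and the scores feed the assignment problem. All names are assumptions; the paper uses a neural value function and an ILP solver, whereas a simple greedy matching stands in for the ILP here.

```python
def score_assignments(pairs, value_fn, gamma=0.9):
    """Score candidate (vehicle, request_group) pairings.

    pairs: dict mapping (vehicle, group) -> (immediate_reward, next_state)
    value_fn: approximate value function (a neural net in the paper);
              here any callable next_state -> float works.
    """
    return {k: r + gamma * value_fn(s) for k, (r, s) in pairs.items()}

def greedy_assign(scores):
    # Greedy stand-in for the ILP: take pairings in descending score
    # order, using each vehicle and each request group at most once.
    used_v, used_g, out = set(), set(), {}
    for (v, g), sc in sorted(scores.items(), key=lambda kv: -kv[1]):
        if v not in used_v and g not in used_g:
            out[v] = g
            used_v.add(v)
            used_g.add(g)
    return out
```

Setting `value_fn` to zero recovers the myopic assignment the abstract criticizes; a learned value function makes the same machinery account for future demand.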